Goto

Collaborating Authors

 approximation error


Distributional Off-Policy Evaluation with Deep Quantile Process Regression

Kuang, Qi, Wang, Chao, Jiao, Yuling, Zhou, Fan

arXiv.org Machine Learning

This paper investigates the off-policy evaluation (OPE) problem from a distributional perspective. Rather than focusing solely on the expectation of the total return, as in most existing OPE methods, we aim to estimate the entire return distribution. To this end, we introduce a quantile-based approach for OPE using deep quantile process regression, presenting a novel algorithm called Deep Quantile Process regression-based Off-Policy Evaluation (DQPOPE). We provide new theoretical insights into the deep quantile process regression technique, extending existing approaches that estimate discrete quantiles to estimate a continuous quantile function. A key contribution of our work is the rigorous sample complexity analysis for distributional OPE with deep neural networks, bridging theoretical analysis with practical algorithmic implementations. We show that DQPOPE achieves statistical advantages by estimating the full return distribution using the same sample size required to estimate a single policy value using conventional methods. Empirical studies further show that DQPOPE provides significantly more precise and robust policy value estimates than standard methods, thereby enhancing the practical applicability and effectiveness of distributional reinforcement learning approaches.


Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration

Pourkamali-Anaraki, Farhad

arXiv.org Machine Learning

The massive scale of pretrained models has made efficient compression essential for practical deployment. Low-rank decomposition based on the singular value decomposition (SVD) provides a principled approach for model reduction, but its exact computation is expensive for large weight matrices. Randomized alternatives such as randomized SVD (RSVD) improve efficiency, yet they can suffer from poor approximation quality when the singular value spectrum decays slowly, a regime commonly observed in modern pretrained models. In this work, we address this limitation from both theoretical and empirical perspectives. First, we establish a connection between low-rank approximation error and predictive performance by analyzing softmax perturbations, showing that deviations in class probabilities are controlled by the spectral error of the compressed weights. Second, we demonstrate that RSVD is inadequate, and we propose randomized subspace iteration (RSI) as a more effective alternative. By incorporating multiple power iterations, RSI improves spectral separation and provides a controllable mechanism for enhancing approximation quality. We evaluate our approach on both convolutional networks and transformer-based architectures. Our results show that RSI achieves near-optimal approximation quality while outperforming RSVD in predictive accuracy under aggressive compression, enabling efficient model compression.


Influence Malleability in Linearized Attention: Dual Implications of Non-Convergent NTK Dynamics

Miñoza, Jose Marie Antonio, Medina, Paulo Mario P., Ibañez, Sebastian C.

arXiv.org Machine Learning

Understanding the theoretical foundations of attention mechanisms remains challenging due to their complex, non-linear dynamics. This work reveals a fundamental trade-off in the learning dynamics of linearized attention. Using a linearized attention mechanism with exact correspondence to a data-dependent Gram-induced kernel, both empirical and theoretical analysis through the Neural Tangent Kernel (NTK) framework shows that linearized attention does not converge to its infinite-width NTK limit, even at large widths. A spectral amplification result establishes this formally: the attention transformation cubes the Gram matrix's condition number, requiring width $m = Ω(κ^6)$ for convergence, a threshold that exceeds any practical width for natural image datasets. This non-convergence is characterized through influence malleability, the capacity to dynamically alter reliance on training examples. Attention exhibits 6--9$\times$ higher malleability than ReLU networks, with dual implications: its data-dependent kernel can reduce approximation error by aligning with task structure, but this same sensitivity increases susceptibility to adversarial manipulation of training data. These findings suggest that attention's power and vulnerability share a common origin in its departure from the kernel regime.


Improved Particle Approximation Error for Mean Field Neural Networks

Neural Information Processing Systems

Recent works (Chen et al., 2022; Suzuki et al., 2023b) have demonstrated In this work, we improve the dependence on logarithmic Sobolev inequality (LSI) constants in their particle approximation errors which can exponentially deteriorate with the regularization coefficient. One may consider adding Gaussian noise to the gradient descent to make the method more stable.






Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding Chenhao Zhou

Neural Information Processing Systems

In this paper, we propose a provably efficient natural policy gradient algorithm called Spectral Dynamic Embedding Policy Optimization ( SDEPO) for two-player zero-sum stochastic Markov games with continuous state space and finite action space. In the policy evaluation procedure of our algorithm, a novel kernel embedding method is employed to construct a finite-dimensional linear approximations to the state-action value function.